Abstract

Ontology matching is the process of automatically determining
the semantic equivalences between the concepts of two
ontologies. Most ontology matching algorithms are based on
two types of strategies: terminology-based strategies, which
align concepts based on their names or descriptions, and
structure-based strategies, which exploit concept hierarchies
to find the alignment. In many domains, there is additional
information about the relationships of concepts represented
in various ways, such as Bayesian networks, decision trees,
and association rules. We propose to use the similarities
between these relationships to find more accurate alignments.
We accomplish this by defining soft constraints that prefer
alignments where corresponding concepts have the same local
relationships encoded as knowledge rules. We use a
probabilistic framework to integrate this new knowledge-based
strategy with standard terminology-based and structure-based
strategies. Furthermore, our method is particularly effective
in identifying correspondences between complex concepts.
Our method achieves substantially better F-score than the
previous state-of-the-art on three ontology matching domains.


Introduction 

Ontology matching is the process of aligning two semantically
related ontologies. Traditionally, this task is performed
by human experts. Since the task is tedious and error prone, 
especially in larger ontologies, there has been substantial 
work on developing automated or semi-automated ontology 
matching systems (Shvaiko and Euzenat 2013). While some
automated matching systems make use of data instances, in 
this paper we focus on the schema-level ontology matching 
task, in which no data instances are used. 

Previous schema-level ontology matching systems mainly
use two classes of strategies. Terminology-based strategies
discover corresponding concepts with similar names or
descriptions. Structure-based strategies discover corresponding
groups of concepts with similar hierarchies. In many
cases, additional information about the relationships among
the concepts is available through domain models, such as
Bayesian networks, decision trees, and association rules. A
domain model can be represented as a collection of knowledge
rules, each of which denotes a semantic relationship
among several concepts. These relationships may be complex,
uncertain, and rely on imprecise numeric values. In this
paper, we introduce a new knowledge-based strategy which
uses the structure of these knowledge rules as (soft)
constraints on the alignment.

As a motivating example, consider two ontologies about
basketball games. One has datatype properties height and
weight and a binary property center for players, while
the other has the corresponding datatype properties h, w, and
position. Terminology-based strategies may not identify
these correspondences. However, if we know that a large
value of height implies center is true in the first ontology,
and the same relationship holds for h and position = Center
in the second ontology, then we have reason to believe
that height maps to h and center maps to position = Center.


We use Markov logic networks (MLNs)
(Domingos and Lowd 2009) as a probabilistic language
to combine the knowledge-based strategy with
other strategies, in a formalism similar to that of
(Niepert, Meilicke, and Stuckenschmidt 2010). In particular,
we encode the knowledge-based strategy with weighted
formulas that increase the probability of alignments where
corresponding concepts have isomorphic relationships. We
use an MLN inference engine to find the most likely alignment.
We name our method Knowledge-Aware Ontology
Matching (KAOM).

Our approach is also capable of identifying complex
correspondences, an extremely difficult task in ontology
matching. A complex correspondence is a correspondence
between a simple concept and a complex concept (e.g.,
grad.student maps to the union of PhD and Masters).
This can be achieved by constructing a set of complex
concepts (e.g., unions) in each ontology, subsequently
generating candidate complex correspondences, and using multiple
strategies - including the knowledge-based strategy - to find
the correct complex correspondences.

The contributions of this work are as follows: 


• We show how to represent common types of domain models
as knowledge rules, and how to use these knowledge
rules to guide the ontology matching process, leading to
more accurate alignments. Our approach is especially
effective in identifying the correspondences of numerical
or nominal datatype properties. By incorporating complex
concepts, our approach is also capable of discovering
complex correspondences, which is a very difficult
scenario in the ontology matching task.

• We evaluate the effectiveness of KAOM in three domains
with different types of knowledge rules, and show that
our approach not only outperforms the state-of-the-art
approaches for ontology matching in one-to-one matching,
but also discovers complex correspondences successfully.

The rest of the paper is organized as follows. First we
review previous work on ontology matching. We then
introduce the concept of “knowledge rules” with a definition
and examples. Next, we show our approach of using
Markov logic to incorporate multiple strategies, including a
knowledge-based strategy and the treatment of complex
correspondences. Finally we present experimental results and
conclude.


Ontology Matching 

We begin by formally defining ontology matching. 

Definition 1 (Ontology Matching
(Euzenat and Shvaiko 2007)). Given two ontologies
$O_1$ and $O_2$, a correspondence is a 3-tuple $(e_1, e_2, r)$ where
$e_1$ and $e_2$ are entities of the first and second ontologies
respectively, and $r$ is a semantic relation such as equivalence
($\equiv$) and subsumption ($\sqsubseteq$ or $\sqsupseteq$). An alignment is a set of
correspondences. Ontology matching is the task or process
of identifying the correct semantic alignment between the
two ontologies. In most cases, ontology matching focuses
on equivalence relationships only.


Most existing schema-level ontology matching systems
use two types of strategies: terminology-based and
structure-based. Terminology-based strategies are based on
terminological similarity, such as string-based or linguistic
similarity measures. Structure-based strategies are based
on the assumption that two matching ontologies should
have similar local or global structures, where the structure
is represented by subsumption relationships of classes and
properties, and domains and ranges of properties. Advanced
ontology matching systems often combine the two types of
strategies (e.g., (Melnik, Garcia-Molina, and Rahm 2002)).
See (Shvaiko and Euzenat 2013) for a survey of
ontology matching systems and algorithms. Recently,
a probabilistic framework based on Markov
logic was proposed to combine multiple strategies
(Niepert, Meilicke, and Stuckenschmidt 2010).


Definition 2 (Complex Correspondences). A complex concept
is a composition (e.g., unions, complements) of one or
more simple concepts. In OWL^1, there are several constructors
for creating complex classes and properties (see the top
part of Table 1 for an incomplete list of constructors). A
complex correspondence is an equivalence relation between
a simple class or property and a complex class or property
in two ontologies (Ritze et al. 2008).


Previous work takes several different approaches to
finding complex correspondences (i.e., complex matching).
One is to construct candidates for complex correspondences
using operators for primitive classes, such as
string concatenation or arithmetic operations on numbers
(Dhamankar et al. 2004). (Ritze et al. 2008) introduce four
specific patterns for complex correspondences: Class by
Attribute Type (CAT), Class by Inverse Attribute Type (CIAT),
Class by Attribute Value (CAV), and Property Chain pattern
(PC). Finally, when aligned or overlapping data is available,
inductive logic programming (ILP) techniques can be used
as well (Hu et al. 2011; Qin, Dou, and LePendu 2007).

^1 http://www.w3.org/TR/owl2-primer/


Representation of Domain Knowledge 

In the AI community, knowledge is typically represented in
formal languages, among which ontologies and ontology-based
languages (e.g., the Web Ontology Language, OWL)
are the most widely used forms. OWL is the W3C standard
ontology language that describes the classes, properties and
relations of objects in a specific domain.

OWL and many other ontology languages are based on
variations of description logics. However, the choice of using
description logic as the foundation of the Semantic Web
ontology languages is largely due to the trade-off between
expressivity and reasoning efficiency. In tasks such as
ontology matching, reasoning does not need to be instant, so
we can afford to consider more general forms of knowledge
outside of a specific ontology language or description logic.

Definition 3 (Knowledge Rule). A knowledge rule is a sentence
$R(a, b, \ldots; \theta)$ in a formal language which consists of
a relation $R$, a set of entities (i.e., classes, attributes or
relations) $\{a, b, \ldots\}$, and (optionally) a set of parameters $\theta$.
A knowledge rule carries logical or probabilistic semantics
representing the relationship among these entities. The
specific semantics depend on $R$.

Many domain models and other types of knowledge can
be represented as sets of knowledge rules, each rule describing
the relationship of a small number of entities. The semantics
of each relationship $R$ can typically be expressed
with a formal language. Table 1 shows some examples of
the symbols used in formal languages such as description
logic, along with their associated semantics.


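Table 1: Syntax and semantics of DL symbols (top), DL axioms
(middle), and other knowledge rules used in the examples
of the paper (bottom)

Syntax                   Semantics

$C \sqcap D$             $C^{\mathcal{I}} \cap D^{\mathcal{I}}$
$C \sqcup D$             $C^{\mathcal{I}} \cup D^{\mathcal{I}}$
$\neg C$                 $\mathcal{D} \setminus C^{\mathcal{I}}$
$\exists R.C$            $\{x \in \mathcal{D} \mid \exists y\,((x, y) \in R^{\mathcal{I}} \wedge y \in C^{\mathcal{I}})\}$
$R \circ S$              $\{(x, y) \mid \exists z\,((x, z) \in R^{\mathcal{I}} \wedge (z, y) \in S^{\mathcal{I}})\}$
$R \upharpoonright C$    $\{(x, y) \in R^{\mathcal{I}} \mid y \in C^{\mathcal{I}}\}$

$C \sqsubseteq D$        $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
$C \sqsubseteq \neg D$   $C^{\mathcal{I}} \cap D^{\mathcal{I}} = \emptyset$

$R \prec S$              $y < y'$ for $\forall (x, y) \in R^{\mathcal{I}} \wedge (x, y') \in S^{\mathcal{I}}$
$C \Rightarrow D$        $\Pr(D^{\mathcal{I}} \mid C^{\mathcal{I}})$ is close to 1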


We illustrate a few forms of knowledge rules with the
following examples.

Example 1. The submission deadline precedes the camera
ready deadline: $\text{paperDueOn} \prec \text{manuscriptDueOn}$. This is
represented as $R_1(\text{paperDueOn}, \text{manuscriptDueOn})$ with
$R_1(a, b) : a \prec b$.

Example 2. A basketball player taller than 81 inches and
heavier than 245 pounds is likely to be a center:
$h > 81 \wedge w > 245 \Rightarrow \text{pos} = \text{Center}$. This rule can be
viewed as a branch of a decision tree or an association rule.
It can be represented as $R_2(h, w, \text{pos=Center}, [81, 245])$,
with $R_2(a, b, c, \theta) : a > \theta_1 \wedge b > \theta_2 \Rightarrow c$.

Example 3. A smoker’s friend is likely to be a smoker as
well: $\text{Smokes}(x) \wedge \text{Friend}(x, y) \Rightarrow \text{Smokes}(y)$. This rule
can be represented as $R_3(\text{Smokes}, \text{Friend})$ with
$R_3(a, b) : a(x) \wedge b(x, y) \Rightarrow a(y)$.
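
As a minimal sketch of Definition 3, the three examples above can be stored as small records holding a relation pattern, the entities it relates, and optional parameters. The class name `KnowledgeRule`, the helper `same_pattern`, and the pattern labels are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class KnowledgeRule:
    """A rule R(a, b, ...; theta): a relation pattern over ontology entities."""
    pattern: str                     # hypothetical label standing for the relation R
    entities: Tuple[str, ...]        # classes, attributes, or relations
    params: Tuple[float, ...] = ()   # optional parameters theta


# Example 1: paperDueOn precedes manuscriptDueOn
r1 = KnowledgeRule("precedes", ("paperDueOn", "manuscriptDueOn"))
# Example 2: h > 81 and w > 245 => pos = Center
r2 = KnowledgeRule("both_above_implies", ("h", "w", "pos=Center"), (81.0, 245.0))
# Example 3: Smokes(x) and Friend(x, y) => Smokes(y)
r3 = KnowledgeRule("propagates_along", ("Smokes", "Friend"))


def same_pattern(r: KnowledgeRule, s: KnowledgeRule) -> bool:
    """Two rules are comparable when they share a pattern and an arity."""
    return r.pattern == s.pattern and len(r.entities) == len(s.entities)
```

Keeping the parameters separate from the pattern reflects the point made later in the paper that thresholds need not agree between two ontologies even when the rule pattern does.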


Knowledge Aware Ontology Matching 

In this section, we present our approach, Knowledge Aware
Ontology Matching (KAOM). KAOM uses Markov logic
networks (Domingos and Lowd 2009) to solve the ontology
matching task. The MLN formulation is similar to
(Niepert, Meilicke, and Stuckenschmidt 2010) but incorporates
the knowledge-based matching strategy and treatment
of complex correspondences.

In the ontology matching problem, we represent a
correspondence with a binary relation, match(a, a'), which is
true if concept a from the first ontology is semantically
equivalent to concept a' from the second ontology (e.g.,
match(writePaper, writes) means $\text{writePaper} \equiv \text{writes}$).

We define three components of the MLN of the ontology
matching problem: constants, evidence and formulas. The
logical constants are the entities in both ontologies. The
evidence includes the complete set of OWL-supported
relationships among all concepts in each ontology (e.g.,
subsumptions and disjointness; we use an OWL reasoner to create
the complete set of OWL axioms), and rules converted to
first-order atoms as described in the previous section.

For the formulas, we begin with a set of formulas adapted
from (Niepert, Meilicke, and Stuckenschmidt 2010). The
numbers preceding the formulas are the weights. A missing
weight means a formula with infinite weight, i.e., a hard
constraint.

1. A-priori similarity is the string similarity between all
pairs of concepts:

$s_{a,a'} \quad \text{match}(a, a')$    (1)

where $s_{a,a'}$ is the string similarity between a and a',
which also serves as the weight of the formula. We use
the Levenshtein measure (Levenshtein 1966) for simple
correspondences. This atomic formula increases the
probability of matching pairs of concepts with similar strings,
all other things being equal.

2. Cardinality constraints enforce one-to-one simple (or
complex) correspondences:

$\text{match}(a, a') \wedge \text{match}(a, a'') \Rightarrow a' = a''$    (2)

3. Coherence constraints enforce consistency of subclass
relationships:

$a \sqsubseteq b \wedge a' \sqsubseteq \neg b' \Rightarrow \neg(\text{match}(a, a') \wedge \text{match}(b, b'))$    (3)


Knowledge-based Strategy

We propose a new knowledge-based strategy for ontology
matching that uses the similarity of knowledge rules
in the two ontologies. It is inspired by the structure-based
strategy in many ontology matching algorithms (e.g.,
(Melnik, Garcia-Molina, and Rahm 2002)). The strategy
favors alignments that preserve the same types of knowledge
rules. It extends the subsumption relationship of
entities used in structure-based strategies to sub-property and
disjoint-property relationships, user-defined relations such as
ordering of dates, and non-deterministic relationships such as
correlation and anti-correlation. The strategy can be
represented as the Markov logic formulas:

$-w_k \quad R_k(a, b, \ldots) \wedge \neg R_k(a', b', \ldots) \Rightarrow \text{match}(a, a') \wedge \text{match}(b, b') \wedge \ldots, \quad k = 1, \ldots, m$    (4)

$+w_k \quad R_k(a, b, \ldots) \wedge R_k(a', b', \ldots) \Rightarrow \text{match}(a, a') \wedge \text{match}(b, b') \wedge \ldots, \quad k = 1, \ldots, m$    (5)

where $R_k$ is a rule pattern.

Example 4. A reviewer of a paper cannot be
the paper’s author. In the cmt ontology we have
$R_4(\text{writePaper}, \text{readPaper})$ and in the confOf ontology
we have $R_4(\text{writes}, \text{reviews})$, where $R_4(a, b) : a \sqsubseteq \neg b$
is the disjoint relationship of properties. Applying the
constraint formulas defined above, we increase the score of
all alignments containing the two correct correspondences:
$\text{writePaper} \equiv \text{writes}$ and $\text{readPaper} \equiv \text{reviews}$.

Rules involving continuous numerical attributes often
include parameters (e.g., the thresholds in Example 2) that do
not match between different ontologies. In order to apply
the knowledge-based strategy to numerical attributes,
we make the assumption that corresponding numerical
attributes are related by a roughly positive linear
transformation. This assumption is often true in real
applications, for instance, when an imperial measure of height
matches to a metric measure of height.

We propose two methods to handle numerical attributes.
The first method is to compute a distance measure (e.g.,
Kullback-Leibler divergence) between the distributions of
the corresponding attributes in a candidate alignment.
Specifically, we replace Formulas 4 and 5 with:

$d_0 - d \quad \text{match}(a, a') \wedge \text{match}(b, b') \wedge \ldots, \quad k = 1, \ldots, m$    (6)

where $d$ is a distance measure of the two rules $R_k(a, b, \ldots)$
and $R'_k(a', b', \ldots)$ and $d_0$ is a threshold.

Example 5. In the nba-os ontology, we have conditional
rules converted from a decision tree, such as

$h > 81 \wedge w > 245 \Rightarrow \text{Center}$















Similarly, in the nbayahoo ontology, we have

$h' > 2.06 \wedge w' > 112.5 \Rightarrow \text{Center}'$

Here the knowledge rules represent the conditional
distributions of multiple entities. We define the distance between the
two conditional distributions as

$d(h, w, \text{Center}; h', w', \text{Center}') = \mathbb{E}_{p(h,w)}\, d\big(p(\text{Center} \mid h, w) \,\|\, p(\text{Center}' \mid h', w')\big)$

where $\mathbb{E}(\cdot)$ is expectation and $d(p \| p')$ is a distance
measure. (Because Center and Center' are binary attributes,
we simply use $|p - p'|$ as the distance measure. For
numerical attributes, we can use the difference of two
distribution histograms as the distance measure.) We assume the
attribute correspondences (h and h', w and w') are linear
mappings, and the linear relation can be roughly estimated
(e.g., by matching the minimum and maximum numbers in
these rules). When computing the expectation over h and
w, we apply the linear mapping to generate corresponding
values of h' and w'. The distribution of the conditional
attributes $p(h, w)$ can be roughly estimated as independent and
uniform over the ranges of the attributes.
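
The distance in Example 5 can be approximated numerically. The sketch below is only an illustration under stated assumptions: each rule is treated as a deterministic 0/1 conditional probability, the attribute ranges `H1`, `W1`, `H2`, `W2` are hypothetical values chosen for the example, and the function names are not from the paper. The linear map is estimated by matching range minima and maxima, and the expectation uses an independent, uniform grid over h and w.

```python
from typing import Callable, Tuple

Range = Tuple[float, float]


def linear_map(src: Range, dst: Range) -> Callable[[float], float]:
    """Positive linear map sending src (min, max) onto dst (min, max)."""
    (s0, s1), (d0, d1) = src, dst
    scale = (d1 - d0) / (s1 - s0)
    return lambda x: d0 + scale * (x - s0)


def rule_distance(p, p_prime, h_rng: Range, w_rng: Range,
                  h_rng2: Range, w_rng2: Range, steps: int = 50) -> float:
    """Approximate E_{p(h,w)} |p(c|h,w) - p'(c'|h',w')| with p(h,w)
    independent and uniform over the attribute ranges (midpoint grid)."""
    to_h2 = linear_map(h_rng, h_rng2)
    to_w2 = linear_map(w_rng, w_rng2)
    total = 0.0
    for i in range(steps):
        h = h_rng[0] + (i + 0.5) * (h_rng[1] - h_rng[0]) / steps
        for j in range(steps):
            w = w_rng[0] + (j + 0.5) * (w_rng[1] - w_rng[0]) / steps
            total += abs(p(h, w) - p_prime(to_h2(h), to_w2(w)))
    return total / (steps * steps)


# Hypothetical attribute ranges: inches/pounds vs. metres/kilograms.
H1, W1 = (69.0, 90.0), (160.0, 325.0)
H2, W2 = (1.75, 2.29), (72.5, 147.5)


def center(h: float, w: float) -> float:
    """Rule from the first ontology, read as a 0/1 probability."""
    return 1.0 if h > 81 and w > 245 else 0.0


def center2(h: float, w: float) -> float:
    """Rule from the second ontology, read as a 0/1 probability."""
    return 1.0 if h > 2.06 and w > 112.5 else 0.0


d = rule_distance(center, center2, H1, W1, H2, W2)
```

With these assumed ranges the two rules disagree only on a thin band of weights, so `d` is small and the weight $d_0 - d$ of Formula 6 is positive for any moderate threshold $d_0$.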

The second method for handling continuous attributes is
to discretize them, reducing the continuous attribute problem
to the discrete problem described earlier. For example,
suppose each continuous attribute x is replaced with
a discrete attribute $x^d$ indicating the quartile of x rather
than its original value. Then we have $R_5(h^d, w^d, \text{Center})$
and $R_5(h'^d, w'^d, \text{Center}')$ with relation
$R_5(a, b, c) : a = 4 \wedge b = 4 \Rightarrow c$, where the discrete value of 4
indicates that both a and b are in the top quartile. Other
discretization methods are also possible, as long as the
discretization is done the same way (e.g., equal-width) in
both domains.
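
A minimal sketch of quartile discretization follows, using Python's standard `statistics.quantiles`. The sample heights are hypothetical; the point illustrated is that a positive linear change of units (inches to metres) leaves the quartile index of every value unchanged, so rules over discretized attributes become directly comparable.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def to_quartiles(values: Sequence[float]) -> List[int]:
    """Replace each value by its quartile index (1 = bottom, 4 = top)."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_right(cuts, v) + 1 for v in values]


heights_in = [72, 75, 78, 79, 80, 82, 84, 85]           # hypothetical sample, inches
heights_m = [round(h * 0.0254, 4) for h in heights_in]  # same players, metres

same_codes = to_quartiles(heights_in) == to_quartiles(heights_m)
```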

Our method does not rely on the forms of knowledge
rules, nor does it rely on the algorithms used to learn these
rules. As long as similar data mining techniques or tools are
used for both ontologies, we can expect to find useful
knowledge-based similarities between the two ontologies.
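
To make the role of the weighted formulas concrete, the sketch below scores a candidate alignment by summing the weights of Formula 1 and Formulas 4 and 5, and treats the cardinality constraint as hard. It is a simplified illustration, not the MLN inference used in the paper: the coherence constraint is omitted, rules are plain (pattern, entities) tuples, the normalisation of the Levenshtein distance is one common choice, and the second-ontology property names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (pattern label, entities)
Alignment = Dict[str, str]           # entity in O1 -> entity in O2


def levenshtein(s: str, t: str) -> int:
    """Classic edit distance by dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Normalised string similarity in [0, 1] (an assumed normalisation)."""
    return 1.0 - levenshtein(s, t) / max(len(s), len(t), 1)


def score(align: Alignment, rules1: List[Rule], rules2: List[Rule],
          w_k: float = 1.0) -> float:
    """Sum of soft-formula weights for a candidate alignment."""
    if len(set(align.values())) < len(align):   # cardinality: hard constraint
        return float("-inf")
    total = sum(similarity(a, b) for a, b in align.items())   # Formula 1
    targets = set(rules2)
    for pattern, ents in rules1:                # Formulas 4 and 5
        if all(e in align for e in ents):
            image = (pattern, tuple(align[e] for e in ents))
            total += w_k if image in targets else -w_k
    return total
```

For instance, with `rules1 = [("precedes", ("paperDueOn", "manuscriptDueOn"))]` and a hypothetical second ontology containing `("precedes", ("submissionDeadline", "cameraReadyDeadline"))`, the alignment that preserves the ordering rule scores higher than the one that swaps the two deadlines.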

Complex Correspondences 

Our approach can also find complex correspondences, which
contain complex concepts in either or both of the ontologies.
We add the complex concepts into consideration and treat
them the same way as simple concepts, and all the simple
and complex correspondences will be solved jointly by
considering terminology, structure, and knowledge-based
strategies in a single probabilistic formulation.

First, because complex concepts are recursively defined
and potentially infinite, we need to select a finite subset of
complex concepts and use them to generate the candidate
correspondences. We will only include the complex concepts
occurring in the ontology axioms or in the knowledge
rules.

Second, we need to define a string similarity measure
for each type of complex correspondence. For example,
(Ritze et al. 2008) requires two conditions for a Class by
Attribute Type (CAT) matching pattern $O_1{:}a \equiv O_2{:}\exists p.b$
(e.g., a = Accepted-Paper, p = hasDecision, b = Acceptance):
a and b are terminologically similar, and
the domain of p (Paper in the example) is a superclass of a.
We can therefore define the string similarity of a and $\exists p.b$ to
be the string similarity of a and b, which coincides with the
first condition; the second condition is encoded in the
structure stability constraints. The string similarity measure
of many other types of correspondences can be defined
similarly based on the heuristic method in (Ritze et al. 2008).

If there is no straightforward way to define the
string similarity for a certain type of complex correspondence,
we can simply set it to 0 and rely on other strategies
to identify such correspondences.

Lastly, we need constraints for the correspondence of
two complex concepts. If the component concepts correspond
and the constructor is the same, then the complex concepts
always correspond (a hard constraint), while in the other
direction the implication is a soft constraint:

$\text{match}(a, a') \wedge \text{match}(b, b') \wedge \ldots \Rightarrow \text{match}(c, c')$
$w^{c}_{k} \quad \text{match}(a, a') \wedge \text{match}(b, b') \wedge \ldots \Leftarrow \text{match}(c, c')$

where $c = \text{cons}_k(a, b, \ldots)$, $c' = \text{cons}_k(a', b', \ldots)$ for each
constructor $\text{cons}_k$ (e.g., union, $\exists p.b$).
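
The two constraints above can be sketched as a single check on one pair of complex concepts built with the same constructor. This is an illustration under assumptions: complex concepts are modelled as (constructor, components) tuples, the function name `complex_constraint` and the default weight are hypothetical, and the concept names in the usage note other than PhD and Masters are invented for the example.

```python
from typing import Set, Tuple

Complex = Tuple[str, Tuple[str, ...]]   # (constructor, component concepts)


def complex_constraint(matches: Set[tuple], c: Complex, c2: Complex,
                       w: float = 0.5) -> float:
    """Contribution of one (c, c') pair to the alignment score.

    Hard direction: matched components imply matched complex concepts.
    Soft direction (weight w): a matched complex pair implies matched components.
    """
    (cons, parts), (cons2, parts2) = c, c2
    if cons != cons2 or len(parts) != len(parts2):
        raise ValueError("the constraint applies only to the same constructor")
    components = all((a, a2) in matches for a, a2 in zip(parts, parts2))
    whole = (c, c2) in matches
    if components and not whole:
        return float("-inf")            # hard constraint violated
    # The soft formula is satisfied unless the complex pair is matched
    # while some component pair is not.
    return w if (components or not whole) else 0.0
```

For example, with `c = ("union", ("PhD", "Masters"))` and a hypothetical `c2 = ("union", ("Doctoral", "MSc"))`, a match set containing both component pairs but not `(c, c2)` is ruled out, while one containing only `(c, c2)` is allowed but earns no weight.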

Some complex correspondences are almost impossible to
identify with traditional strategies. With the knowledge-based
strategy, identifying them becomes possible.

Example 6. A reviewer of a paper cannot be the paper’s
author. In the cmt ontology we have

$\text{writePaper} \sqsubseteq \neg \text{readPaper}$

and in the conference ontology we have

$\text{contributes} \upharpoonright \text{Reviewed-contribution} \sqsubseteq \neg(\text{contributes} \circ \text{reviews})$

We first build two complex concepts $\text{contributes} \upharpoonright \text{Reviewed-contribution}$
and $\text{contributes} \circ \text{reviews}$.
With $R_4(a, b) : a \sqsubseteq \neg b$ (disjoint properties), the score
function would favor the correspondences

$\text{writePaper} \equiv \text{contributes} \upharpoonright \text{Reviewed-contribution}$
$\text{readPaper} \equiv \text{contributes} \circ \text{reviews}$

Experiments 

We test our KAOM approach on three domains: NBA, census
and conference. The sizes of the ontologies of these domains
are listed in Table 2. These domains contain very different
forms of ontologies and knowledge rules, so we can
examine the generality and robustness of our approach.

We use Pellet (Sirin et al. 2007) for logical inference of
the ontological axioms and TheBeast^2 (Riedel 2008) and
Rockit^3 (Noessner, Niepert, and Stuckenschmidt 2013) for
Markov logic inference. We compare our system (KAOM)
with three others: KAOM without the knowledge-based
strategy (MLOM), CODI (Huber et al. 2011) (a new version
of (Niepert, Meilicke, and Stuckenschmidt 2010), which is
essentially a different implementation of MLOM), and
logmap2 (Jimenez-Ruiz, Grau, and Zhou 2012), a top
performing system in OAEI 2014^4.

^2 http://code.google.com/p/thebeast/
^3 https://code.google.com/p/rockit/ We use
Rockit for the census domain because TheBeast is not able to
handle the large number of rules in that domain.

Table 2: Number of classes (#c), object properties (#o), data
properties (#d) and nominal values (#v) of each ontology
used in the experiments.

domain     ontology     #c    #o    #d    #v

NBA        nba-os        3     3    20     3
           yahoo         4     4    21     7

census     adult         1     0    15   101
           income        1     0    12    97

OntoFarm   cmt          36    50    10     0
           confOf       38    13    25     0
           conference   60    46    18     6
           edas        103    30    20     0
           ekaw         78    33     0     0


We manually specify the weights of the Markov
logic formulas in KAOM and MLOM. The weights
of stability constraints for subclass relationships
are set to the same values as those used in
(Niepert, Meilicke, and Stuckenschmidt 2010), i.e., the
weight for subclass is -0.5, and those for sub-domain and
range are -0.25. In KAOM, we also set the weights for
different types of similarity rules based on our assessment
of their relative importance and kept these weights fixed
during the experiments.


NBA 

The NBA domain is a simple setting that we use to demonstrate
the effectiveness of our approach. We collected data
from the NBA official website and the Yahoo NBA website.
For each ontology, we used the WinMine toolkit^5 to learn a
decision tree for each attribute using the other attributes as
inputs.

For each pair of conditional distributions based on decision
trees with up to three attributes, we calculate their
similarity based on the distance measure described in
Example 5. We use the Markov logic formula (6) with the
threshold $d_0 = 0.2$. To make the task more challenging, we did not
use any name similarity measures. Our method successfully
identified the correspondence of all the numerical and nominal
attributes, including height, weight and positions (center,
forward and guard) of players. In contrast, without a name
similarity measure, no other method can solve the matching
problem at all.


Census 

We consider two census datasets and their ontologies from
the UC Irvine data repository^6. Both datasets represent census
data but are sampled and post-processed differently. These
two census ontologies are flat with a single concept but many
datatype properties and nominal values. For this domain, we
use association rules as the knowledge. We first discretize
each numerical attribute into five intervals, and then generate
association rules for each ontology using the Apriori
algorithm with a minimum confidence of 0.9 and minimum
support of 0.001. For example, one generated rule is:

    age='(-inf-25.5]' education='11th'
    hours-per-week='(-inf-35.5]'
    => gross-income='<=50K' conf:(1)

This is represented as

$R_6(\text{age}^d, \text{11th}, \text{hours-per-week}^d, \text{gross-income}^d)$

where $x^d$ refers to the discretized value of x, split into
five equal-percentile intervals, and
$R_6(a, b, c, d) : a = 1 \wedge b \wedge c = 1 \Rightarrow d = 1$.
For scalability reasons, we consider up to three
concepts in a knowledge rule, i.e., association rules with up
to three attributes. We set the weight of knowledge similarity
constraints for the association rules to 0.25.

^4 http://oaei.ontologymatching.org/2014/
^5 http://research.microsoft.com/en-us/um/people/
^6 https://archive.ics.uci.edu/ml/datasets.html
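
As a reminder of what the two Apriori thresholds measure, the sketch below computes the support and confidence of a single candidate rule over a list of records, using the standard definitions (support: fraction of records satisfying antecedent and consequent; confidence: that count divided by the number of records satisfying the antecedent). The records and the function name `support_confidence` are hypothetical; this is not the rule miner used in the experiments.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def support_confidence(records: List[Record], antecedent: Record,
                       consequent: Record) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent."""
    def holds(cond: Record, rec: Record) -> bool:
        return all(rec.get(k) == v for k, v in cond.items())

    n_ante = sum(holds(antecedent, r) for r in records)
    n_both = sum(holds(antecedent, r) and holds(consequent, r) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical, already-discretized records.
records = [
    {"age": "(-inf-25.5]", "education": "11th", "gross-income": "<=50K"},
    {"age": "(-inf-25.5]", "education": "11th", "gross-income": "<=50K"},
    {"age": "(-inf-25.5]", "education": "11th", "gross-income": "<=50K"},
    {"age": "(45.5-inf)", "education": "Masters", "gross-income": ">50K"},
]
sup, conf = support_confidence(
    records,
    {"age": "(-inf-25.5]", "education": "11th"},
    {"gross-income": "<=50K"},
)
keep_rule = conf >= 0.9 and sup >= 0.001   # thresholds quoted in the text
```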



Figure 1: Precision, recall and F1 on the census domain as a
function of the string similarity threshold $\tau$.

In the Markov logic formulation in
(Niepert, Meilicke, and Stuckenschmidt 2010), only the
correspondences with a-priori similarity measure larger than
a threshold $\tau$ are added as evidence. In the experiments, we
set $\tau$ to different values from 0.50 to 0.90. When $\tau$ is
large, we deliberately discard the string similarity information
for some correspondences. MLOM for this task is an
extension of (Niepert, Meilicke, and Stuckenschmidt 2010)
by adding correspondences of nominal values and their
dependencies with the related attributes. The results are shown
in Figure 1. We can see that KAOM always gets better
recall and F1, with only a slight degradation in precision.
This suggests that our approach leverages the knowledge rule
information and thus relies less on the names of
the concepts to determine the mapping. For example, when
$\tau$ is 0.70, KAOM finds 6 out of 8 correspondences of values
of adult:workclass and income:class_of_worker,
while MLOM finds none. The other two systems were not
designed for nominal value correspondences; among the
attribute correspondences, CODI identified only 7 and
logmap2 only 3, while KAOM and MLOM find all 12.




































Figure 2: Precision, recall and F1 on the OntoFarm domain with only the one-to-one correspondences.

Figure 3: Precision, recall and F1 on the OntoFarm domain with the complex correspondences.

Figure 4: Precision-recall curve on the OntoFarm domain with the complex correspondences.


OntoFarm 

In order to show how our system can use manually created
expert knowledge bases, we use OntoFarm, a standard
ontology matching benchmark for an academic conference
domain, as the third domain in our experiments. As part of
OAEI, it has been widely used in the evaluation of ontology
matching systems. Manually creating knowledge
rules is time consuming, so we only used 5 of the
OntoFarm ontologies (cmt, conference, confOf, edas,
ekaw). Using their knowledge of computer science conferences
and the structure of just one ontology, two individuals
listed a number of rules (e.g., Example 1). We then translated
these rules into each of the five ontologies. Thus, the
same knowledge was added to each of the ontologies, but its
representation depended on the specific ontology. For some
ontologies, some of the rules were not representable with the
concepts in them and thus had to be omitted. This manually
constructed knowledge base was developed before running
any experiments and kept fixed throughout our experiments.
Among the 5 ontologies, we have 10 pairs of matching tasks
in total. We set $\tau$ to 0.70, and the weight for the knowledge
similarity constraints to 1.0.

We first compare the four methods to the reference
one-to-one alignment from the benchmark (Figure 2). KAOM
achieves similar precision and F1, and better recall than
other systems. It was able to identify correspondences in
which the concept names are very different, for instance,
$\text{cmt:readPaper} \equiv \text{confOf:reviews}$. Note that the similarity
constraints work in concert with other constraints. For
instance, in Example 4, since disjointness is a symmetric
knowledge rule, domain and range constraints could be helpful
to identify whether cmt:writePaper should match to
confOf:writes or confOf:reviews.

To evaluate our approach on complex correspondences,
we extended the reference alignment with hand-labeled
complex correspondences (Figure 3). MLOM does not perform
well in this task because the complex correspondences
require a good similarity measure to become candidates
(such as the linguistic features in (Ritze et al. 2008)).
KAOM, however, uses the structure of the rules to find many
complex correspondences without relying on complex
similarity measures. For this task we also tried learning the
weights of the formulas (KAOM-learn). For each of the
10 pairs of ontologies, we used the remaining 9 pairs as
training data. KAOM-learn performs slightly better than KAOM.

With the hand-picked or automatically learned weights,
KAOM produces a single most-likely alignment. However,
we can further tune KAOM to produce alignments with
higher recall or higher precision. We accomplish this by
adding the MLN formula match(a, a') with weight w. When
w is positive, alignments with more matches are more likely,
and when w is negative, alignments with fewer matches are
more likely (all other things being equal). We adjusted this
weight to produce the precision-recall curve shown in
Figure 4. KAOM dominates CODI and provides much higher
recall values than logmap2, although logmap2’s best
precision remains slightly above KAOM’s.
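
The effect of the bias weight w can be illustrated with a deliberately simplified model in which candidate correspondences do not interact, so the most likely alignment keeps a pair exactly when its net weight plus w is positive. The net weights, the reference alignment, and the function names below are hypothetical; the actual system performs joint MLN inference with hard constraints.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def select(net_weight: Dict[Pair, float], w: float) -> Set[Pair]:
    """Keep a candidate iff its net weight plus the bias w is positive
    (valid only under the assumption that candidates are independent)."""
    return {p for p, s in net_weight.items() if s + w > 0}


def precision_recall(found: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of a found alignment against a reference."""
    hit = len(found & reference)
    precision = hit / len(found) if found else 1.0
    recall = hit / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical net weights and reference alignment.
net = {("a", "a2"): 0.8, ("b", "b2"): 0.1, ("c", "x2"): -0.2, ("d", "d2"): -0.6}
ref = {("a", "a2"), ("b", "b2"), ("d", "d2")}

# Sweeping w traces out (w, precision, recall) points of a curve.
curve = [(w, *precision_recall(select(net, w), ref)) for w in (-0.5, 0.0, 0.5, 1.0)]
```

In this toy setting a negative w keeps only the strongest candidate (high precision, low recall), while a large positive w admits every candidate (full recall, lower precision).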

Conclusion 

We proposed a new ontology matching algorithm, KAOM.
The key component of KAOM is the knowledge-based strategy,
which is based on the intuition that ontologies about
the same domain should contain similar knowledge rules,
in spite of the different terminologies they use. KAOM is
also capable of discovering complex correspondences, by
treating complex concepts the same way as simple ones.
We encode the knowledge-based strategy and other strategies
in Markov logic and find the best alignment with its
inference tools. Experiments on the datasets and ontologies
from three different domains show that our method
effectively uses knowledge rules of different forms to outperform
several state-of-the-art ontology matching methods.